1 Programming languages for distributed computing systems
Henri E. Bal, Jennifer G. Steiner, Andrew S. Tanenbaum 
September 1989 ACM Computing Surveys (CSUR), volume 21 issue 3 

Full text available: pdf(6.50 MB) Additional Information: full citation, abstract, references, ci

When distributed systems first appeared, they were programmed in traditional sequential languag 
procedures for sending and receiving messages. As distributed applications became more commor 
approach became less satisfactory. Researchers all over the world began designing new programs 
distributed applications. These languages and their history, their underlying pr ... 

2 Simultaneous reference allocation in code generation for dual data memory bank ASIPs 
Ashok Sudarsanam, Sharad Malik 

April 2000 ACM Transactions on Design Automation of Electronic Systems (TODAES), \ 

Full text available: pdf(156.30 KB) Additional Information: full citation, abstract, references, ci

We address the problem of code generation for DSP systems on a chip. In such systems, the amoi 
so application software must be sufficiently dense. Additionally, the software must be written so a: 
which may include hard real-time constraints. Unfortunately, current compiler technology is unabh 
architectures are highly irregular. Thus, designers often r ... 



Keywords: code generation, code optimization, graph labelling, memory bank assignment, regist 



3 Compiler transformations for high-performance computing 
David F. Bacon, Susan L. Graham, Oliver J. Sharp 
December 1994 ACM Computing Surveys (CSUR), volume 26 issue 4 

Full text available: pdf(6.32 MB) Additional Information: full citation, abstract, references, ci

In the last three decades a large number of compiler transformations for optimizing programs hav 
uniprocessors reduce the number of instructions executed by the program using transformations t 
flow techniques. In contrast, optimizations for high-performance superscalar, vector, and parallel
locality with transformations that rely on tracking the properties o ... 

Keywords: compilation, dependence analysis, locality, multiprocessors, optimization, parallelism, 



4 Design tradeoffs for the Alpha EV8 conditional branch predictor

Andre Seznec, Stephen Felix, Venkata Krishnan, Yiannakis Sazeides 

May 2002 ACM SIGARCH Computer Architecture News, volume 30 issue 2 

Full text available: pdf(1.24 MB) Publisher Site Additional Information: full citation, abstract, references, ci

This paper presents the Alpha EV8 conditional branch predictor. The Alpha EV8 microprocessor pro;
development, envisioned an aggressive 8-wide issue out-of-order superscalar microarchitecture fe
multithreading. Performance of such a processor is highly dependent on the accuracy of its branch
area was devoted to branch prediction on EV8. The Alpha EV8 branch pre ...



Keywords: EV8 processor, Branch Prediction 



5 The effect of instruction fetch bandwidth on value prediction 
Freddy Gabbay, Avi Mendelson 

April 1998 ACM SIGARCH Computer Architecture News , Proceedings of the 25th annu; 

architecture, volume 26 issue 3 
Full text available: pdf( 32 MB ) Publisher Site Additional Information: full citation, abstract, references, ci

Value prediction attempts to eliminate true-data dependencies by dynamically predicting the outcc 
dependent instructions based on that prediction. In this paper we attempt to understand the limit* 
We show that the instruction-fetch bandwidth and the issue rate have a very significant impact on 
study how recent techniques to improve the instructio ... 

6 Data and memory optimization techniques for embedded systems 

P. R. Panda, F. Catthoor, N. D. Dutt, K. Danckaert, E. Brockmeyer, C. Kulkarni, A. Vandercappelle, P 
April 2001 ACM Transactions on Design Automation of Electronic Systems (TODAES), \ 

Full text available: pdf(339.91 KB) Additional Information: full citation, abstract, references, ci

We present a survey of the state-of-the-art techniques used in performing data and memory-relat 
optimizations are targeted directly or indirectly at the memory subsystem, and impact one or mor 
performance, and power dissipation of the resulting implementation. We first examine architecture 
transformations. We next cover a broad spectrum of optimizati ...

Keywords: DRAM, SRAM, address generation, allocation, architecture exploration, code transfornr 
synthesis, memory architecture customization, memory power dissipation, register file, size estim; 



7 Converting thread-level parallelism to instruction-level parallelism via simultaneous multithre
Jack L. Lo, Joel S. Emer, Henry M. Levy, Rebecca L. Stamm, Dean M. Tullsen, S. J. Eggers
August 1997 ACM Transactions on Computer Systems (TOCS), volume 15 issue 3

Full text available: pdf(526.39 KB) Additional Information: full citation, abstract, references, ci

To achieve high performance, contemporary computer systems rely on two forms of parallelism: ir 
parallelism (TLP). Wide-issue super-scalar processors exploit ILP by executing multiple instruction:
Multiprocessors (MP) exploit TLP by executing different threads in parallel on different processors, 
statically partition processor resources, thus preventing t ... 

Keywords: cache interference, instruction-level parallelism, multiprocessors, multithreading, simi 



8 A general framework for prefetch scheduling in linked data structures and its application to n

Seungryul Choi, Nicholas Kohout, Sumit Pamnani, Dongkeun Kim, Donald Yeung 

May 2004 ACM Transactions on Computer Systems (TOCS), volume 22 issue 2 

Full text available: pdf(2.45 MB) Additional Information: full citation, abstract, references, ir

Pointer-chasing applications tend to traverse composite data structures consisting of multiple inde 
single pointer chain leads to the serialization of memory operations, the traversal of independent
parallelism. This article investigates exploiting such interchain memory parallelism for the purpose 
called multi-chain prefetching. Previous work ... 

Keywords: Data prefetching, memory parallelism, pointer-chasing code 



9 TRIPS: A polymorphous architecture for exploiting ILP, TLP, and DLP

Karthikeyan Sankaralingam, Ramadass Nagarajan, Haiming Liu, Changkyu Kim, Jaehyuk Huh, Nitya 

Robert G. McDonald, Charles R. Moore 

March 2004 ACM Transactions on Architecture and Code Optimization (TACO), volume 1 iss
Full text available: pdf(832.30 KB) Additional Information: full citation, abstract, references, ir



This paper describes the polymorphous TRIPS architecture that can be configured for different gra 
architecture is the first in a class of post-RISC, dataflow-like instruction sets called explicit data-gr 
with hardware mechanisms that enable the processing cores and the on-chip memory system to b 
instruction, data, or thread-level parallelism. To adapt ... 

Keywords: Computer architecture, configurable computing, scalable and high-performance comp 



10 The white dwarf: a high-performance application-specific processor

A. Wolfe, M. Breternitz, C. Stephens, A. L. Ting, D. B. Kirk, R. P. Bianchini, J. P. Shen 

May 1988 ACM SIGARCH Computer Architecture News , Proceedings of the 15th Annu; 

architecture, volume 16 issue 2 
Full text available: pdf(1.40 MB) Additional Information: full citation, abstract, references, ci

This paper presents the design and implementation of a high-performance special-purpose proces: 
element analysis algorithms. The White Dwarf CPU contains two Am29325 32-bit floating-point pn 
employs a wide-instruction word architecture in which the application algorithm is directly implenrn 
compatible and interfaces with a SUN 3/160 host. The syste ...

11 Distributed operating systems 

Andrew S. Tanenbaum, Robbert Van Renesse 

December 1985 ACM Computing Surveys (CSUR), volume 17 issue 4 

Full text available: pdf(5.49 MB) Additional Information: full citation, abstract, references, ci

Distributed operating systems have many aspects in common with centralized ones, but they also 
an introduction to distributed operating systems, and especially to current university research abo 
distributed operating system and how it is distinguished from a computer network, various key de 
of current research projects are examined in some detail ... 

12 Increasing the instruction fetch rate via multiple branch prediction and a branch address cac 
Tse-Yu Yeh, Deborah T. Marr, Yale N. Patt 

August 1993 Proceedings of the 7th international conference on Supercomputing 

Full text available: pdf(1.13 MB) Additional Information: full citation, references, citings, index terms, reviev



13 A parallel embedded-processor architecture for ATM reassembly 
Richard F. Hobson, P. S. Wong 

February 1999 IEEE/ACM Transactions on Networking (TON), volume 7 issue 1

Full text available: pdf(331.21 KB) Additional Information: full citation, references, citings, index term



Keywords: ATM, embedded systems, medium access control, segmentation and reassembly 



14 Embedded applications: AES and the cryptonite crypto processor
Dino Oliva, Rainer Buchty, Nevin Heintze 

October 2003 Proceedings of the 2003 international conference on Compilers, architectur 

Full text available: pdf(346.09 KB) Additional Information: full citation, abstract, references, ir

CRYPTONITE is a programmable processor tailored to the needs of crypto algorithms. The design c 
application analysis in which standard crypto algorithms (AES, DES, MD5, SHA-1, etc) were distille 
this methodology and use AES as a central example. Starting with a functional description of AES, 
AES efficiently in hardware, and present several novel optimizations (whic ... 

Keywords: AES, architecture, cryptography, high-bandwidth, high-speed, processor, round key g 



15 Performance evaluation and improvement of a dynamically microprogrammable computer w

Shinji Tomita, Kiyoshi Shibayama, Toshiaki Kitamura, Hiroshi Hagiwara 

November 1980 Proceedings of the 13th annual workshop on Microprogramming 



Full text available: pdf(1.21 MB) Additional Information: full citation, abstract, references, ci



A new microprogrammable computer with low-level parallelism was built and has been utilized as 
research-oriented applications such as real-time processings on static/dynamic images, pictures ai 
virtual machines including high (intermediate) level language machines. The design goal of a rese; 
high degree of processing power and system flexi ... 

16 The Vector-Thread Architecture 

March 2004 ACM SIGARCH Computer Architecture News , Proceedings of the 31st annus 

architecture, volume 32 issue 2 
Full text available: pdf(317.13 KB) Additional Information: full citation, abstract

The vector-thread (VT) architectural paradigm unifies the vector and multithreaded compute mode
with a control processor and a vector of virtual processors (VPs). The control processor can use ve(
all the VPs or each VP can use thread-fetches to direct its own control flow. A seamless intermixing
allows a VT architecture to flexibly and compactly encode application ...

17 Response Time Analysis of Multiprocessor Computers for Database Support 
Roger K. Shultz, Roy J. Zingg 

March 1984 ACM Transactions on Database Systems (TODS), volume 9 issue 1

Full text available: pdf(2.27 MB) Additional Information: full citation, abstract, references, cj

Comparison of three multiprocessor computer architectures for database support is made possible 
These expressions are derived by parameterizing algorithms performed by each machine to execu 
represent properties of the database and components of the machines. Studies of particular paran 
conventional machine technology, for low selectivity, high duplicate occurrence, ... 

18 Retrieval operations and data representations in a context-addressed disc system 
Stanley Y. W. Su, George P. Copeland, G. Jack Lipovski 

November 1973 Proceedings of the 1973 meeting on Programming languages and informati 

Full text available: pdf(1.15 MB) Additional Information: full citation, abstract, references, ci

This paper attempts to demonstrate that simple expansion of the processing capabilities of fixed d 
mappings from high-level retrieval language to machine language and from user oriented data rep 
oriented data representation (storage structure) which are found necessary in conventional von N< 
built in the disc read and write heads for each disc track allow inf ... 

19 System-level power optimization: techniques and tools
Luca Benini, Giovanni de Micheli 

April 2000 ACM Transactions on Design Automation of Electronic Systems (TODAES), v 

Full text available: pdf(385.22 KB) Additional Information: full citation, abstract, references, ci

This tutorial surveys design methods for energy-efficient system-level design. We consider electro 
software layers. We consider the three major constituents of hardware that consume energy, nam 
units, and we review methods of reducing their energy consumption. We also study models for an; 
for energy-efficient software design and compilation. This survey ...

20 Exploiting choice: instruction fetch and issue on an implementable simultaneous multithread
Dean M. Tullsen, Susan J. Eggers, Joel S. Emer, Henry M. Levy, Jack L. Lo, Rebecca L. Stamm 

May 1996 ACM SIGARCH Computer Architecture News , Proceedings of the 23rd annu; 

architecture, volume 24 issue 2 
Full text available: pdf(1.48 MB) Additional Information: full citation, abstract, references, ci

Simultaneous multithreading is a technique that permits multiple independent threads to issue mi 
demonstrated the performance potential of simultaneous multithreading, based on a somewhat id< 
throughput gains from simultaneous multithreading can be achieved without extensive changes to 
hardware structures or sizes. We present an architecture for s ... 